Fault-Tolerance with Multimodule Routers
Authors
Abstract
Current multiprocessors such as the Cray T3D support interprocessor communication using partitioned dimension-order routers (PDRs). In a PDR implementation, the routing logic and switching hardware are partitioned into multiple modules, with each module suitable for implementation as a chip. This paper proposes a method to incorporate fault tolerance into such routers with simple changes to the router structure and logic. The previously known fault-tolerant routing methods assume centralized crossbar-based routers and are not applicable to multiprocessors with PDRs. The proposed technique works for the convex fault model using only local knowledge of faults. Using the proposed techniques and as few as four virtual channels per physical channel, torus networks with PDRs can handle faults without compromising deadlock and livelock freedom. Simulations for dimensional torus and mesh networks show that the resulting fault-tolerant PDRs perform similarly to crossbar-based routers.
Similar Resources
Fault-Tolerant Communication with Partitioned Dimension-Order Routers with Complex Faults
The current fault-tolerant routing methods require extensive changes to practical routers such as the Cray T3D's dimension-order router to handle faults. In this paper, we propose methods to handle faults in multicomputers with dimension-order routers with simple changes to router structure and logic. Our techniques can be applied to current implementations in which the router is partitioned i...
An efficient routing methodology to tolerate static and dynamic faults in 2-D mesh networks-on-chip
The move towards nanoscale Integrated Circuits (ICs) increases performance and capacity, but poses process variation and reliability challenges which may cause several faults on routers in Networks-on-Chips (NoCs). While utilizing healthy routers in an NoC is desirable, faulty regions with different shapes are formed gathering faulty routers. Fault regions can be used to lead the fault-tolerant...
Adaptive multimodule routers
International Conference on High Performance Computing, pp. 342-347, December 1997 Abstract. Recent multiprocessors such as Cray T3D support interprocessor communication using partitioned dimension-order routers (PDRs). In a PDR implementation, the routing logic and switching hardware is partitioned into multiple modules, with each module suitable for implementation as a chip. This paper propos...
Mixed-Criticality Systems based on a CAN Router with Support for Fault Isolation and Selective Fault-Tolerance
In many application domains there is an increasing trend for mixed-criticality systems with functions of different assurance levels on shared computing platforms. Today’s CAN-based platforms do not support the requirements of mixed-criticality systems. A single CAN bus provides low cost, real-time support and flexibility for applications where the communication service is not safety-relevant. F...
A Resolving Set based Algorithm for Fault Identification in Wireless Mesh Networks
Wireless Mesh Networks (WMNs) have emerged as a key technology for next-generation wireless networking. By adding some long-range links, a wireless mesh network turns into a complex network with small-world characteristics. For a communication backbone, high fault tolerance is a significant property of WMNs. In this paper, we design a novel malfunctioned router det...